When to Reset Your Keys: Optimal Timing of Security Updates via Learning
Abstract
Cybersecurity is increasingly threatened by advanced and persistent attacks. As these attacks are often designed to disable a system (or a critical resource, e.g., a user account) repeatedly, it is crucial for the defender to keep updating its security measures to strike a balance between the risk of being compromised and the cost of security updates. Moreover, these decisions often need to be made with limited and delayed feedback due to the stealthy nature of advanced attacks. Beyond targeted attacks, such an optimal timing policy under incomplete information has broad applications in cybersecurity, including key rotation, password changes, application of patches, and virtual machine refreshing. However, rigorous studies of optimal timing are rare. Further, existing solutions typically rely on a pre-defined attack model that is known to the defender, which is often not the case in practice. In this work, we make an initial effort towards achieving optimal timing of security updates in the face of unknown stealthy attacks. We consider a variant of the influential FlipIt game model with asymmetric feedback and an unknown attack time distribution, which provides a general model of consecutive security updates. The defender's problem is then modeled as a time associative bandit problem with dependent arms. We derive upper confidence bound based learning policies that achieve low regret compared with optimal periodic defense strategies, which can only be derived when attack time distributions are known.

Introduction

Malicious attacks are constantly evolving to inflict increasing levels of damage on the nation's infrastructure systems, corporate IT systems, and our digital lives. For example, the Advanced Persistent Threat (APT) has become a major concern in cybersecurity in the past few years. APT attacks exhibit two distinguishing behavior patterns (van Dijk et al. 2013) that make them extremely difficult to defend against using traditional techniques.
First, these attacks are often well-funded and persistent. They attack a target system (or a critical resource) repeatedly with the goal of compromising it completely, e.g., by stealing the full cryptographic key. Second, the attacks can be highly adaptive. In particular, they often act covertly, e.g., by operating in a "low-and-slow" fashion (Bowers et al. 2014), to avoid immediate detection and obtain long-term advantages.

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

From the defender's perspective, an effective way to thwart continuous and stealthy attacks is to update its security measures periodically to strike a balance between the risk of being compromised and the cost of updates. The primary challenge, however, is that such decisions must often be made with limited and delayed feedback because of the covert nature of the attacker. In addition to thwarting targeted attacks, such an optimal timing problem with incomplete information is crucial in various cybersecurity scenarios, e.g., key rotation (van Dijk et al. 2013), password changes (Tan and Xia 2016), application of patches (Beattie et al. 2002), and virtual machine refreshing (Juels et al. 2016). For example, Facebook receives approximately 600,000 "compromised logins" from impostors every day (Barnett 2011). An effective way to stop these attacks is to ask users to update their passwords when the risk of attack is high.

Although time-related tactical security choices have been studied since the Cold War era (Blackwell 1949), rigorous study of timing decisions in the face of continuous and stealthy attacks is relatively new. In 2012, in response to an APT attack against it, the RSA lab proposed the FlipIt game, one of the first models to study timing decisions under stealthy takeovers.
The FlipIt game model abstracts away details about concrete attack and defense operations by focusing on the stealthy and persistent nature of the players. The basic model considers two players, each of whom can "flip" the state of a system at any time at a cost. A player only learns the system state when she herself moves. The payoff of a player is defined as the fraction of time during which the resource is under her control, less the total cost incurred. The FlipIt game captures the stealthy behavior of players in an elegant way by allowing various types of feedback structures. In the basic model, where neither player gets any feedback during the game and each move flips the state of the resource instantaneously, it is known that periodic strategies with random starting phases form a pair of best response strategies (van Dijk et al. 2013). As a variant of the basic model, an asymmetric setting is studied in (Laszka, Johnson, and Grossklags 2013), where the defender gets no feedback during the game while the attacker obtains immediate feedback after each defense but incurs a random attack time to take over the resource. In this setting, it is shown in (Laszka, Johnson, and Grossklags 2013) that periodic defense and immediate attack (or no attack) form a pair of best response strategies. However, little is known beyond these two cases. In particular, designing adaptive defense strategies with partial feedback remains an open problem.

Although the FlipIt game provides a proper framework for understanding the strategic behavior of stealthy takeovers, it relies on detailed prior knowledge about the attacker. In particular, it requires parameters such as the amount of time needed to compromise a resource and the unit cost of each attack (or their distributions) to be fixed and known to the defender so that the equilibrium solution can be derived. These assumptions limit the scope of the attack model and can be hard to verify before the game starts.
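To make the basic model concrete, the following sketch simulates one run of FlipIt in which both players use periodic strategies with uniformly random starting phases. All parameter values and function names are illustrative, not from the paper; the payoff computation follows the definition above (fraction of time in control, less total move cost, normalized per unit time).

```python
import random

def simulate_flipit(horizon, d_period, a_period, d_cost, a_cost, seed=0):
    """One run of basic FlipIt: each move instantly flips control to the
    mover, and neither player observes the other's moves during the game.
    Both players play periodic strategies with random starting phases."""
    rng = random.Random(seed)
    d_phase = rng.uniform(0, d_period)
    a_phase = rng.uniform(0, a_period)
    d_moves = [d_phase + k * d_period for k in range(int(horizon / d_period) + 1)]
    a_moves = [a_phase + k * a_period for k in range(int(horizon / a_period) + 1)]
    d_moves = [t for t in d_moves if t <= horizon]
    a_moves = [t for t in a_moves if t <= horizon]
    # Replay all moves in time order, tracking who controls the resource.
    events = sorted([(t, 'D') for t in d_moves] + [(t, 'A') for t in a_moves])
    owner, last_t, d_time = 'D', 0.0, 0.0    # defender controls at t = 0
    for t, mover in events:
        if owner == 'D':
            d_time += t - last_t
        owner, last_t = mover, t
    if owner == 'D':
        d_time += horizon - last_t
    # Payoff = fraction of time in control minus per-unit-time move cost.
    d_payoff = d_time / horizon - d_cost * len(d_moves) / horizon
    a_payoff = (horizon - d_time) / horizon - a_cost * len(a_moves) / horizon
    return d_payoff, a_payoff
```

Averaging such runs over many seeds estimates the expected payoffs of a pair of periodic strategies, which serves as a no-feedback baseline.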
To address this fundamental limitation, we propose to study online learning algorithms that make minimal assumptions about the attacker and learn an optimal defense strategy from the limited feedback obtained during the game. Given the advances in big data analytics and their applications in cybersecurity, it is feasible for the defender to obtain partial feedback even under stealthy attacks. Such a learning approach makes it possible to derive adaptive and robust defense strategies against unknown attacks, where the type of the attacker is drawn from a fixed but unknown distribution, as well as the more challenging dynamic attacks, where the type of the attacker can vary arbitrarily over time.

In this work, we make a first effort towards achieving optimal timing of security updates in the face of unknown stealthy attacks. We consider a variant of the FlipIt game with asymmetric feedback similar to (Laszka, Johnson, and Grossklags 2013), but with two key differences. First, we consider repeated unknown attacks with the attacker's type sampled from an unknown distribution. Second, we assume that the defender obtains limited feedback about potential attacks at the end of each period. The defender's goal is to minimize the long-term cumulative loss. Our objective is to derive an adaptive defense policy that has low regret compared with the optimal periodic defense policy when the attack time distribution is known.

A key observation is that the defense periods the defender can choose from are dependent, in the sense that the loss from one defense period may reveal the potential loss from other periods, especially shorter ones. Moreover, two defense policies played for the same number of rounds may span different lengths of time, which has to be taken into account when comparing the policies. In this paper, we model the defender's problem as a time associative stochastic bandit problem with dependent arms, where each arm corresponds to one possible defense period.
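As a concrete illustration of this arm structure, the sketch below combines a UCB-style index over cost per unit time with side observations across periods. The round model is an assumption made for the example (the defender resets after a chosen period tau, the attacker needs a random time x to compromise, and the round's loss is the compromised time max(0, tau - x) plus a fixed reset cost); it is a minimal sketch in the spirit of this setting, not the paper's exact algorithm or constants.

```python
import math

class PeriodicDefenseUCB:
    """UCB-style learner over a finite set of candidate defense periods,
    exploiting the dependence between arms: one round with period tau
    also reveals the loss of every shorter period.

    Hypothetical round model: loss = max(0, tau - x) + reset_cost, where
    x is the (random) attack time in that round.
    """

    def __init__(self, periods, reset_cost, rate_bound=1.0):
        self.periods = sorted(periods)
        self.reset_cost = reset_cost
        self.rate_bound = rate_bound               # assumed bound on loss rates
        self.counts = {tau: 0 for tau in self.periods}
        self.mean_loss = {tau: 0.0 for tau in self.periods}
        self.t = 0

    def select(self):
        """Play the period with the smallest optimistic cost per unit time.
        Rounds of different arms span different lengths of time, so arms are
        compared by loss rate (loss divided by period), not raw loss."""
        self.t += 1
        for tau in self.periods:
            if self.counts[tau] == 0:              # play every arm once first
                return tau
        def index(tau):
            bonus = self.rate_bound * math.sqrt(2 * math.log(self.t) / self.counts[tau])
            return self.mean_loss[tau] / tau - bonus
        return min(self.periods, key=index)

    def observe(self, tau_played, x):
        """Feedback after a round with period tau_played and attack time x
        (x may exceed tau_played, meaning no compromise this round). The
        same, possibly censored, observation min(x, tau_played) updates
        every arm no longer than tau_played."""
        for tau in self.periods:
            if tau <= tau_played:
                loss = max(0.0, tau - min(x, tau_played)) + self.reset_cost
                self.counts[tau] += 1
                self.mean_loss[tau] += (loss - self.mean_loss[tau]) / self.counts[tau]
```

Because max(0, tau' - x) depends on x only through values below tau', a single round played with a long period yields one loss sample for every shorter period as well; this is the dependence between arms that observe() exploits.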
We derive optimal defense strategies for both the finite-armed bandit setting, where the defense periods can only take a finite set of values, and the continuum-armed bandit setting, where the defense periods can take any value from a non-empty interval. Our main contributions can be summarized as follows.

• We propose a stochastic time associative bandit model for optimal timing of security updates in the face of unknown attacks. Our model captures both the limited feedback about stealthy attacks and the dependence between different defense options.

• We derive upper confidence bound (UCB) based policies for time associative bandits with dependent arms. Our policies achieve a regret of O(log(T(K+1)) + K) for the finite-arm case, where T is the number of rounds played and K is the number of arms, and a regret of O(T) for the continuous-arm setting.

Our learning model and algorithms are built upon the assumption that the defender can learn from frequent system compromises. This is reasonable for many online systems, such as large online social networks, content providers, and large public clouds, in which many customers are subject to similar attacks. In this setting, even if a single user is compromised only occasionally, the system administrator can pool data collected from multiple users to obtain a reliable estimate quickly. For example, given the large number of attacks towards its users, Facebook can collect data from thousands of incidents of similar compromises in a short time. Our online learning algorithms could be used by Facebook to alert users to update their passwords when necessary.

Related Work

Time-related tactical security choices have been studied since the Cold War era (Blackwell 1949). However, the study of timing decisions in the face of continuous and stealthy attacks is relatively new. In particular, the FlipIt game (van Dijk et al. 2013) and its variants (Laszka, Johnson, and Grossklags 2013; Laszka et al.
2014) are among the few models that study this problem in a rigorous way. However, all of these models assume that the parameters of the attacker are known to the defender at the beginning of the game. A gradient-based Bayesian learning algorithm was recently proposed in (Tan and Xia 2016) for a setting similar to ours, where the failure time was assumed to follow a Weibull distribution with one unknown parameter. In contrast, we consider a general attack time distribution.

Multi-armed bandit problems have been extensively studied in both the stochastic setting and the adversarial setting (Bubeck and Cesa-Bianchi). Many variants of bandit models have been considered, including bandits with side observations (Caron et al. 2012; Buccapatnam, Eryilmaz, and Shroff 2014). In the context of cybersecurity, bandit models have been applied to anomaly detection (Liu, Zhao, and Swami 2013) and Stackelberg security games (Balcan et al. 2015). However, the only previous work that studies the time associative bandit model is (György et al. 2007), where the arms are assumed to be mutually independent. In contrast, we propose to model the optimal timing problem in cybersecurity as a time associative bandit problem with dependent arms and study algorithms that can exploit side observations to improve performance.

Model

We consider the following variant of the FlipIt game (van Dijk et al. 2013) with two players, a de-
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017